-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SQLite client and converter #424
Conversation
There was a bug in `make_nullable` that caused UnionTypes to get double nested. I've updated the code to keep the UnionType flat. The code now adds a NullType if one isn't in the `types` attribute. It also sets `default` appropriate.
b5a525f
to
7ba2151
Compare
That's interesting. That means the types are different on write vs read. You can write a This reminds me of Kysely's |
Yea, gotta say I'm not a fan of SQLite's type system. And yes, you're correct, you can write "5" and get back 5. And this coercion is based on the type of the column, which can be anything! You can make the type |
That said, for the happy path, I think my converter should work just fine. The only caveat is date types are treated as "any" types. This is actually how SQLite handles it. I'm open to extending the converter to handle dates in a custom way, but would rather keep that as a separate PR. |
Recap can now read SQLite schemas as Recap types. SQLite's schema system is somewhat strange. Some notes: 1. Any column can store any type. 2. SQLite has 5 storage classes (null, int, real, text, blob). 3. STRICT forces column types to be the storage types. 4. non-STRICT tables allow any strings for column types. 5. non-STRICT column types are hints to coerce types as they're written to disk. 6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite. See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more details. With all of this in mind, Recap's SQLiteConverter works according to SQLite's affinity rules. This means: 1. Unknown types are treated as "ANY", which is a union of all storage types. 2. SQLiteConverter pays attention to precision/scale for REAL, etc. 3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc. 4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types. Closes #418
@mjperrone This is ready for a review. 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! BTW the only info I got about potential use cases is that it would be v3.
|
||
def ls(self) -> list[str]: | ||
cursor = self.connection.cursor() | ||
cursor.execute("SELECT name FROM sqlite_schema WHERE type='table'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could imagine also wanting to capture schema of views. Doesn't seem necessary for the first go at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, good point. I'll open a follow-on GH issue for that. I hope it's as easy as type in ('table', 'view')
😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recap/clients/sqlite.py
Outdated
row = self.add_information_schema(row) | ||
row = self.add_information_schema(row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this duplicated?
row = self.add_information_schema(row) | |
row = self.add_information_schema(row) | |
row = self.add_information_schema(row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I had two different functions but merged them into one i_s
function..
rows = [] | ||
|
||
for row_cells in cursor.fetchall(): | ||
row = dict(zip(names, row_cells)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<3 python
} | ||
|
||
# Extract precision, scale, and octet length. | ||
numeric_pattern = re_compile(r"(\w+)\((\d+)(?:,\s*(\d+))?\)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPT helped me understand this matches strings that look like function calls. Might be helpful to point to some documentation on what TYPE
can look plike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added docs.
cursor.execute( | ||
"SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table,) | ||
) | ||
return bool(cursor.fetchone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This confuses me since above I saw row[0] for row in cursor.fetchall()
. If there is only one element SELECT
ed, is a tuple returned for that row, or just the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content isn't as important as whether the row exists or not. I'm doing the _table_exists
check to prevent SQL injection attacks since you can't %s
or ?
parameterize pragma
calls. The SELECT is getting all rows that exactly match the table
string. If at least one row exists, then we assume the table exists.
The row[0] for row in cursor.fetchall()
is for listing all tables in the database. But it's not guaranteed that a user will call schema()
with a table from the ls()
command, so I wanted to guard against injection.
name TEXT NOT NULL, | ||
short_name CHAR(20) NOT NULL, | ||
variable_name VARCHAR(255) NOT NULL, | ||
descriptive_text CLOB NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CLOB is a weird abbreviation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. And yet it's a thing. 😛
Python's `urlunparse` has this annoying behavior where it doesn't include double slashes if the netloc param is empty. This is annoying with URLs like `sqlite:///foo/bar/baz`. The parser would return `sqlite:/foo/bar/baz` in this scenario. I've updated the safe/unsafe methods in `RecapSettings` to account for this.
Merged! It's now available in 0.11.0! 😄 |
Awesome! You also might want to update the homepage compatibility table to include this 🥳 |
On the list for tomorrow! I want to add docs to the website first. |
Released in 0.12.0: |
Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:
See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.
With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:
Closes #418